Abstract — In the private matching problem, a client and a 
server each hold a set of n input elements. The client wants to 
privately compute the intersection of these two sets: he learns 
which elements he has in common with the server (and nothing 
more), while the server gains no information at all. In certain 
applications it would be useful to have a private matching 
protocol that reports a match even if two elements are only 
similar instead of equal. Such a private matching protocol is 
called fuzzy, and is useful, for instance, when elements may be 
inaccurate or corrupted by errors. 

We consider the fuzzy private matching problem in a semi-honest
environment. Elements are similar if they match on t
out of T attributes. First we show that the original solution
proposed by Freedman et al. [1] is incorrect. Subsequently we
present two fuzzy private matching protocols. The first, simple,
protocol has bit message complexity $O(n\binom{T}{t}(T\log|D| + k))$.
The second, improved, protocol has a much better bit message
complexity of $O(nT(\log|D| + k))$, but here the client incurs an
additional $O(n)$ factor in time complexity. Additionally, we present
protocols based on the computation of the Hamming distance and on
oblivious transfer, that have different, sometimes more efficient,
performance characteristics.

Index Terms — fuzzy matching, secure 2-party computation, 
secret sharing 

I. Introduction 

In the private matching problem [1], a client and a server 
each hold a set of elements as their input. The size of the set 
is n and the type of elements is publicly known. The client 
wants to privately compute the intersection of these two sets: 
the client learns the elements it has in common with the server 
(and nothing more), while the server obtains no information 
at all. 

In certain applications, the elements (think of them as words 
consisting of letters, or tuples of attributes) may not always 
be accurate or completely known. For example, due to errors, 
omissions, or inconsistent spelling, entries in a database may 
not be identical. In these cases, it would be useful to have a 
private matching algorithm that reports a match even if two 
entries are similar, but not necessarily equal. Such a private 
matching is called fuzzy, and was introduced by Freedman et 
al. [1]. Elements are called similar (or matching) in this 
context if they match on t out of T letters at the right locations. 

Fuzzy private matching (FPM) protocols could also be
used to implement a more secure and private algorithm for
biometric pattern matching. Instead of sending the complete
template corresponding to, say, a scanned fingerprint, a fuzzy
private matching protocol could be used to determine the
similarity of the scanned fingerprint with the templates stored
in the database, without revealing any information about this
template in the case that no match is found.

All known solutions for fuzzy private matching, as well as 
our own protocols, work in a semi-honest environment. In this 
environment participants do not deviate from their protocol, 
but may use any (additional) information they obtain to their 
own advantage. 

Freedman et al. [1] introduce the fuzzy private matching
problem and present a protocol for 2-out-of-3 fuzzy private
matching. We show that, unfortunately, this protocol is incorrect
(see Section III): the client can "steal" elements even if
the sets have no similar elements in common.

Building and improving on their ideas, we present two
protocols for t-out-of-T fuzzy private matching (henceforth
simply called fuzzy private matching or FPM for short). The
first, simple, protocol has time complexity $O(n\binom{T}{t})$ and bit
message complexity $O(n\binom{T}{t}(T\log|D| + k))$ (protocol 3). The
second protocol is based on linear secret sharing and has
a much better bit message complexity $O(nT(\log|D| + k))$
(protocol 5). Here the client incurs a time complexity of
$O(n^2\binom{T}{t})$. Note that this is only a factor n worse than
the previous protocol. We also present a simpler version
of protocol 5 (protocol 4) to explain the techniques used
incrementally. This protocol has a slightly worse bit message
complexity.

Note that, contrary to intuition, fuzzy extractors and secure
sketches ([2]) cannot be used to solve the fuzzy private matching
problem.

Indyk and Woodruff [3] present another approach for solving
fuzzy private matching, using the computation of the Hamming
distance together with generic techniques like secure 2-party
computation and oblivious transfer. Generic multi-party
computation and oblivious transfer are not considered
efficient techniques. Therefore, based on the protocol from
[3], we design protocols based on the computation of the Hamming
distance that do not use secure 2-party computation. One
protocol is efficient for small domains of letters (protocol
6 version 1) and the second protocol uses oblivious transfer
(protocol 6 version 2). The major drawback of the first protocol
is a strong dependence on the size of the domain of letters. The
main weakness of the second protocol is its high complexity:
the protocol makes $n^2 \cdot T$ oblivious transfer calls. We
present these protocols mainly to show that other approaches
to solve the fuzzy private matching problem exist as well.

Fig. 1. Results overview (table body not recovered). Table notes:

1 For the sake of simplicity, time complexities are given roughly in numbers
of efficient operations (e.g., secret sharing reconstructions, encryptions,
polynomial evaluations); we also report only the complexity of
the slowest participant.

2 The authors of the paper do not give the exact complexity in the O notation.

3 Protocol with the subroutine from the first paragraph of Section VI-A.

4 Protocol with the subroutine equality-matrix from Figure 7.

We compare our protocols to existing solutions using several
complexity measures in the results overview (Fig. 1). One of these
complexity measures is the $\tilde{O}$ notation used for the bit
message complexity in [3]. This notation is defined as follows.
For functions f and g, we write $f = \tilde{O}(g)$ if
$f(n,k) = O\big(g(n,k) \log^{O(1)}(n) \cdot \mathrm{poly}(k)\big)$, where k is
the security parameter. This notation hides certain factors like
a strong dependence on the security parameter k (e.g. $k^3$), and
is therefore less accurate than the standard big-O notation. We
prefer this measure for the plain message complexity, where
we restrict the bit size of the messages to be linear in k.

Related work can be traced back to private equality test- 
ing (5), Gl, [6| in the 2-party case, where each party has 
a single element and wants to know if they are equal (without 
publishing these elements). Private set intersection HI, (6), 
(possibly among more than two parties) is also related. 
In this problem the output of all the participants should be 
the intersection of all the input sets, but nothing more: a 
participant should gain no knowledge about elements from 
other participants' sets that are not in the intersection. 

Similarly related are the so called secret handshaking pro- 
tocols Si iflOl . They consider membership of a secret 
group, and allow members of such groups to reliably iden- 
tify fellow group members without giving away their group 
membership to non-members and eavesdroppers. We note that 
the (subtle) difference between secret handshaking and set- 
intersection protocols lies in the fact that a set-intersection 
protocol needs to be secure for arbitrary element domains 
(small ones in particular), whereas group membership for
handshaking protocols can be encoded using specially
constructed secret values taken from a large domain.

Privacy issues have also been considered for the approxima- 
tion of a function / among vectors owned by several parties. 
The function / may be Euclidean distance ( ifTTI . fl2l . @), 
set difference (0]), Hamming distance ( IfTTI . (3)), or scalar 
product (reviewed in lfl3l ). 

Our paper is structured as follows. We formally define the
fuzzy private matching problem in Section II and introduce
our system model, some additional notation, and primitives
there as well. Then in Section III we present the solution
from [1] for 2-out-of-3 fuzzy private matching and show where
it breaks down. Section IV contains our first protocol for
t-out-of-T fuzzy private matching that uses techniques similar
to the ones used in [1]. Then we present our second protocol
based on linear secret sharing in Section V. Finally, Section
VI presents two protocols based on the computation of a
Hamming distance. All our protocols assume a semi-honest
environment (see Section II-B).

II. Preliminaries 

In this section, we introduce the fuzzy matching problem 
as well as the mathematical and cryptographic tools that we 
use to construct our protocols. 

A. Fuzzy Private Matching Problem Definition 

Let a client and a server each own a set of words. A fuzzy 
private matching protocol is a 2-party protocol between a client 
and a server, that allows the client to compute the fuzzy set 
intersection of these sets (without leaking any information to 
the server). 

To be precise, let each word $X = x^1 \ldots x^T$ in these sets
consist of T letters $x^i$ from a domain D. Let $X = x^1 \ldots x^T$
and $Y = y^1 \ldots y^T$. We define $X \approx_t Y$ (X and Y match on t
letters) if and only if $t \le |\{k : x^k = y^k \wedge (1 \le k \le T)\}|$.

The input and the output of the protocol are defined as
follows. The client's input is the set $X = \{X_1, \ldots, X_{n_C}\}$ of
$n_C$ words of length T, while the server's input is the set
$Y = \{Y_1, \ldots, Y_{n_S}\}$ of $n_S$ words of length T. Both the client
and the server also have $n_C$, $n_S$, T and t in their inputs. The
output of the client is the set $\{Y_j \in Y \mid \exists X_i \in X : X_i \approx_t Y_j\}$.
This set consists of all the elements from Y that match with
any element from the set X. The server's output is empty
(the server does not learn anything). Usually we assume that
$n_C = n_S = n$. In any case, the sizes of the sets are fixed and
known a priori to the other party (so the protocol does not
have to prevent the other party from learning the size of the set).
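
The definition above can be made concrete with a short Python sketch of the ideal functionality, i.e., what a trusted party holding both sets would output to the client. The function names and the representation of words as tuples are assumptions of this sketch, not notation from the paper, and nothing here is private.

```python
def matches(x, y, t):
    """Return True if words x and y agree on at least t positions."""
    assert len(x) == len(y)
    return sum(1 for a, b in zip(x, y) if a == b) >= t


def fuzzy_intersection(client_words, server_words, t):
    """Ideal functionality: the server words that t-match some client word."""
    return [y for y in server_words
            if any(matches(x, y, t) for x in client_words)]


# Words of length T = 3 over a small domain, threshold t = 2.
client = [(1, 2, 3), (1, 4, 5)]
server = [(5, 4, 3), (1, 2, 9)]
result = fuzzy_intersection(client, server, t=2)
```

In this example only (1, 2, 9) is returned: it agrees with (1, 2, 3) on two letters, whereas (5, 4, 3) agrees with each client word on a single letter only.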

B. Adversary Models 

We prove correctness of our protocols only against
computationally bounded (with respect to a security parameter k)
and semi-honest adversaries, meaning that the parties follow the
protocol but may keep message histories in an attempt to learn
more than is prescribed. Here we provide the intuition and the
informal notion of this model; the reader is referred to [14]
for full definitions. To simplify matters we only consider the
case of two participants, the client and the server.

We have chosen the semi-honest model for a few reasons.
First, no truly efficient solution for the FPM problem
has been found in any model. Second, our protocols
seem to be secure against malicious clients, and the only
possible attacks are on the correctness of the protocols by
malicious servers. Moreover in [15], [16), ifTTl . it is shown 
how to transform a semi-honest protocol into a protocol 
secure in the malicious model. Further, ifTTIl does this at a 
communication blowup of at most a small factor of poly(k). 
Therefore, we assume parties are semi-honest in the remainder 
of the paper (however we are aware that the mentioned generic 
transformations are not too efficient). 

We leave improving the protocols to work efficiently in a
malicious environment, and proofs that the protocols from this
paper are secure against malicious clients, for future work.

In the model with a semi-honest adversary, both parties
are assumed to act according to the protocol (but they
are allowed to use all information that they collect in an
unexpected way to obtain extra information). The security
definition is straightforward in our particular case, as only one
party (the client) learns the output. Following [1] we divide
the requirements into:

• The client's security - indistinguishability: Given that the 
server gets no output from the protocol, the definition of 
the client's privacy requires simply that the server cannot 
distinguish between cases in which the client has different 
inputs. 

• The server's security - comparison to the ideal model: 

The definition ensures that the client does not get more or 
different information than the output of the function. This 
is formalized by considering an ideal implementation 
where a trusted third party TTP gets the inputs of the two 
parties and outputs the defined function. We require that 
in the real implementation of the protocol (one without 
TTP) the client does not learn different information than 
in the ideal implementation. 

Due to space constraints our proofs are informal, presenting 
only the main arguments for correctness and security. 

C. Additively Homomorphic Cryptosystem 

In all our protocols we use a semantically secure, additively
homomorphic public-key cryptosystem, e.g., Paillier's
cryptosystem [18]. Let $\{\cdot\}_K$ denote the encryption function with
the public key K. The homomorphic cryptosystem supports
the following two operations, which can be performed without
the knowledge of the private key.

1) Given the encryptions $\{a\}_K$ and $\{b\}_K$ of a and b, one
can efficiently compute the encryption of a + b, denoted
$\{a + b\}_K := \{a\}_K +_h \{b\}_K$.

2) Given a constant c and the encryption $\{a\}_K$ of a, one
can efficiently compute the encryption of $c \cdot a$, denoted
$\{a \cdot c\}_K := \{a\}_K \cdot_h c$.

These properties hold for suitable operations $+_h$ and $\cdot_h$ defined
over the range of the encryption function. In Paillier's system,
operation $+_h$ is a multiplication and $\cdot_h$ is an exponentiation.
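
To illustrate the two operations, the following is a minimal textbook Paillier sketch in Python. The tiny primes, the fixed generator g = n + 1, and all function names are assumptions chosen for readability; the parameters are far too small to be secure and this is not the instantiation used in the paper.

```python
import math
import random


def keygen(p=1789, q=1861):
    """Toy Paillier key pair from two small primes (insecure, demo only)."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p-1, q-1)
    mu = pow(lam, -1, n)  # valid because the generator is fixed to g = n + 1
    return n, (lam, mu)


def encrypt(n, m):
    """Encrypt m under public key n with fresh randomness r."""
    n2 = n * n
    while True:
        r = random.randrange(1, n)
        if math.gcd(r, n) == 1:
            break
    return (pow(n + 1, m, n2) * pow(r, n, n2)) % n2


def decrypt(n, priv, c):
    lam, mu = priv
    u = pow(c, lam, n * n)
    return ((u - 1) // n * mu) % n


def add_h(n, c1, c2):
    """{a}_K +_h {b}_K: multiply the ciphertexts."""
    return (c1 * c2) % (n * n)


def mul_h(n, c, const):
    """{a}_K ._h c: raise the ciphertext to the constant."""
    return pow(c, const, n * n)


n, priv = keygen()
sum_plain = decrypt(n, priv, add_h(n, encrypt(n, 3), encrypt(n, 4)))
prod_plain = decrypt(n, priv, mul_h(n, encrypt(n, 5), 6))
```

With these inputs the two decryptions should give 3 + 4 and 5 · 6 respectively, without the party performing `add_h` or `mul_h` ever using the private key.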

1) Remark: The domain R of the plaintext of the homomorphic
cryptosystem in all of our protocols (unless specified
differently) is defined as follows: R should be larger than $D^T$
(or in some protocols D) and a uniformly random element
from R should be in $D^T$ (or D) with negligible probability.
This property can be satisfied by representing an element
$a \in D^T$ (or in some protocols $a \in D$) by $r_a = 0^k \| a$ in
R. The domain R should be a field (e.g., $\mathbb{Z}_q$ for some prime
q).

2) Operations on encrypted polynomials: We represent any
polynomial p of degree n (on some ring) as the ordered list
of its coefficients: $[a_0, a_1, \ldots, a_n]$. We denote the encryption
of a polynomial p by $\{p\}_K$ and define it to be the list of
encryptions of its coefficients: $[\{a_0\}_K, \{a_1\}_K, \ldots, \{a_n\}_K]$.

Many operations can be performed on such encrypted polynomials,
like addition of two encrypted polynomials or multiplication
of an encrypted and a plain polynomial. We use the
following property: given an encryption of a polynomial $\{p\}_K$
and some x, one can efficiently compute a value $\{p(x)\}_K$. This
follows from the properties of the homomorphic encryption
scheme:

$$\{p(x)\}_K = \Big\{\sum_{i=0}^{n} a_i x^i\Big\}_K = {\sum_{i=0}^{n}}_h \{a_i x^i\}_K = {\sum_{i=0}^{n}}_h \big(\{a_i\}_K \cdot_h x^i\big)$$
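
The evaluation of an encrypted polynomial can be sketched as follows, again over a toy Paillier instance. The primes, the helper names `enc`, `dec` and `eval_encrypted`, and the example polynomial are assumptions of this sketch; note also that the Paillier plaintext ring $\mathbb{Z}_N$ used here is not a field, unlike the domain R required above.

```python
import math
import random

P, Q = 1789, 1861            # toy primes, far too small for real use
N, N2 = P * Q, (P * Q) ** 2
LAM = (P - 1) * (Q - 1) // math.gcd(P - 1, Q - 1)
MU = pow(LAM, -1, N)


def enc(m):
    r = random.randrange(1, N)
    while math.gcd(r, N) != 1:
        r = random.randrange(1, N)
    return pow(N + 1, m % N, N2) * pow(r, N, N2) % N2


def dec(c):
    return (pow(c, LAM, N2) - 1) // N * MU % N


def eval_encrypted(enc_coeffs, x):
    """Compute {p(x)}_K from [{a_0}_K, ..., {a_n}_K] and a plaintext x."""
    result = enc(0)
    power = 1                                     # x^i mod N
    for c in enc_coeffs:
        result = result * pow(c, power, N2) % N2  # +_h ({a_i}_K ._h x^i)
        power = power * x % N
    return result


coeffs = [6, N - 5, 1]       # p(x) = x^2 - 5x + 6 = (x - 2)(x - 3) mod N
enc_poly = [enc(a) for a in coeffs]
at_root = dec(eval_encrypted(enc_poly, 3))
off_root = dec(eval_encrypted(enc_poly, 10))
```

The evaluator only handles ciphertexts; the holder of the private key sees that the result decrypts to 0 exactly when x is a root of p, which is the property the polynomial-based protocols rely on.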

D. Linear Secret Sharing 

Some of our protocols use t-out-of-T secret sharing. The
secret s is split into T secret shares $\bar{s}^i$ such that any
combination of at least t such shares can be used to reconstruct s.
Combining fewer than t individual shares gives no information
whatsoever about the secret.

A Linear t-out-of-T Secret Sharing (LSS) scheme is a secret
sharing scheme with the following property: given t shares
$\bar{s}^i$ (of secret s), and t shares $\bar{r}^i$ (of secret r) on the same
indices, using $\bar{s}^i + \bar{r}^i$ one can reconstruct the sum of the secrets
s + r. One such LSS scheme is Shamir's original secret sharing
scheme [19].
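
The linearity property can be checked with a compact sketch of Shamir's scheme over a prime field. The modulus $2^{61} - 1$ and the function names are assumptions of this sketch, not parameters taken from the paper.

```python
import random

Q = 2**61 - 1  # a Mersenne prime; the field Z_Q is an assumption of this sketch


def share(secret, t, T):
    """Split secret into T shares (index, value); any t reconstruct it."""
    coeffs = [secret % Q] + [random.randrange(Q) for _ in range(t - 1)]
    return [(i, sum(c * pow(i, e, Q) for e, c in enumerate(coeffs)) % Q)
            for i in range(1, T + 1)]


def reconstruct(shares):
    """Lagrange interpolation at 0 over Z_Q."""
    secret = 0
    for i, (xi, yi) in enumerate(shares):
        num, den = 1, 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                num = num * (-xj) % Q
                den = den * (xi - xj) % Q
        secret = (secret + yi * num * pow(den, -1, Q)) % Q
    return secret


s_shares = share(1234, t=3, T=5)
r_shares = share(4321, t=3, T=5)
# Adding shares index by index yields shares of the sum of the secrets.
summed = [(i, (a + b) % Q) for (i, a), (_, b) in zip(s_shares, r_shares)]
recovered = reconstruct(s_shares[:3])
recovered_sum = reconstruct([summed[0], summed[2], summed[4]])
```

Any three of the five shares recover 1234, and any three of the summed shares recover 1234 + 4321, which is the property the protocols of Section V exploit.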

III. The Original FPM Protocol 

Freedman et al. [1] proposed a fuzzy private matching
protocol for the case where T = 3 and t = 2 (see Figure 2).
Unfortunately, their protocol is incorrect.

1) The idea behind, and the problem of, the protocol from
Figure 2: Intuitively the protocol works because if $X_i \approx_2 Y_j$
then, say, $x_i^2 = y_j^2$ and $x_i^3 = y_j^3$. Hence $P_2(x_i^2) = P_2(y_j^2) = r_i$
and $P_3(x_i^3) = P_3(y_j^3) = r_i$, so $P_2(y_j^2) - P_3(y_j^3) = 0$. Then the
result $\{r' \cdot (P_2(y_j^2) - P_3(y_j^3)) + Y_j\}_K$ sent back by the server
simplifies to $\{Y_j\}_K$ (the random value r' is canceled by the
encryption of 0), which the client can decrypt. If $X_i$ and $Y_j$ do
not match, the random values r, r' and r'' do not get canceled
and effectively blind the value of $Y_j$ in the encryption, hiding
it from the client.



1) The client chooses a private key sk, a public key K and parameters
for the additively homomorphic encryption scheme and sends K and
the parameters to the server.

2) The client:

a) chooses, for every i (such that $1 \le i \le n_C$), a random value
$r_i \in R$.

b) creates 3 polynomials: $P_1$, $P_2$, $P_3$ over R (where polynomial $P_j$ is
used to encode all letters on the jth position) defined by the set of
equations

$r_i = P_1(x_i^1) = P_2(x_i^2) = P_3(x_i^3)$, for $1 \le i \le n_C$.

c) uses interpolation to calculate coefficients of the polynomials
($P_1$, $P_2$, $P_3$) and sends their encryptions to the server.

3) For each $Y_j$ (such that $1 \le j \le n_S$), the server responds to the client:
$\{r \cdot (P_1(y_j^1) - P_2(y_j^2)) + Y_j\}_K$, $\{r' \cdot (P_2(y_j^2) - P_3(y_j^3)) + Y_j\}_K$,
$\{r'' \cdot (P_1(y_j^1) - P_3(y_j^3)) + Y_j\}_K$, where r, r', r'' are fresh random
values in R. This uses the properties of the homomorphic encryption
scheme including the encrypted polynomials explained in Section II-C2.

4) If the client receives an encryption of an encoding of $Y_j$, which is
similar to any word from his set X, then he adds it to the output set.

Fig. 2. Original FPM protocol

There is however a problem with this approach. Consider
the following input data. The input of the client is {[1,2,3],
[1,4,5]}, while the input of the server is {[5,4,3]}. Then
in step 2c of the protocol, the polynomials are defined (by
the client) in the following way: $P_1(1) = r_1 \wedge P_1(1) = r_2$,
$P_2(2) = r_1 \wedge P_2(4) = r_2$ and $P_3(3) = r_1 \wedge P_3(5) = r_2$.
But now we see that, unless $r_1 = r_2$ (which is unlikely
when they are both chosen at random), $P_1$ remains undefined!
Freedman et al. do not consider this possibility. However, if
we try to remedy this problem by setting $r_1 = r_2$ we run
into another one. Among other things, the server computes
$\{r' \cdot (P_2(y_1^2) - P_3(y_1^3)) + Y_1\}_K$, which, in this particular
case, equals $\{r' \cdot (P_2(4) - P_3(3)) + [5,4,3]\}_K$. This equals
$\{r' \cdot (r_2 - r_1) + [5,4,3]\}_K$, which by equality of $r_1$ and $r_2$
reduces to $\{[5,4,3]\}_K$. In other words, the client learns [5,4,3]
even if this value does not match any of the elements held by
the client. This violates the requirements of the fuzzy private
matching problem: if a semi-honest client happens to own a
set of tuples with a property similar to the counterexample
above, it learns a tuple of the server.
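
The counterexample can be replayed numerically. The sketch below works directly on plaintexts (the homomorphic encryption is omitted, since it does not affect the arithmetic); the prime modulus, the concrete values of r and r', and the integer encoding of words are assumptions of this sketch.

```python
Q = 2**61 - 1  # prime modulus standing in for the plaintext field R


def interpolate_at(points, x):
    """Evaluate at x the Lagrange polynomial through the given points."""
    total = 0
    for xi, yi in points:
        num, den = 1, 1
        for xj, _ in points:
            if xj != xi:
                num = num * (x - xj) % Q
                den = den * (xi - xj) % Q
        total = (total + yi * num * pow(den, -1, Q)) % Q
    return total


def encode(word):
    """Pack a 3-letter word into one integer (hypothetical encoding)."""
    return word[0] * 100 + word[1] * 10 + word[2]


client = [(1, 2, 3), (1, 4, 5)]
y = (5, 4, 3)
r = 777                      # forced choice r_1 = r_2, since both words start with 1
# P_w passes through (x_i^w, r_i) for every client word X_i (index w is 0-based here).
points = [sorted({(x[w], r) for x in client}) for w in range(3)]
r_prime = 123456789          # the server's blinding value
diff = (interpolate_at(points[1], y[1]) - interpolate_at(points[2], y[2])) % Q
leaked = (r_prime * diff + encode(y)) % Q
max_agreement = max(sum(a == b for a, b in zip(x, y)) for x in client)
```

Here `diff` vanishes although no client word agrees with [5,4,3] on more than one letter, so the blinding term disappears and `leaked` is exactly the encoding of the server's word.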

IV. A Polynomial Based Protocol 

The protocol of the previous section can be fixed, but in
a slightly more elaborate way. Our solution works for any T
and t, and is presented in Figure 3. In the protocol we use
the following definition. Let $\sigma$ be a combination of t different
indices $\sigma_1, \sigma_2, \ldots, \sigma_t$ from the range {1,...,T} (there are $\binom{T}{t}$
of those). For a word $X \in D^T$, define $\sigma(X) = x^{\sigma_1} \| \cdots \| x^{\sigma_t}$
(i.e., the concatenation of the letters in X found at the indices
in the combination). We now discuss the correctness, security
and complexity of this protocol.

2) Correctness: In the protocol, the client produces $\binom{T}{t}$
polynomials $P_\sigma$ of degree $n_C$. Every polynomial represents
one of the combinations $\sigma$ of t letters from T letters. In fact,
the roots of the polynomial $P_\sigma$ are $\sigma(X_i)$. It is easy to see
that if $X \approx_t Y$ then $\sigma(X) = \sigma(Y)$ for some combination $\sigma$.
Hence, if $X_i \approx_t Y_j$ then $P_\sigma(\sigma(Y_j)) = 0$ for some $P_\sigma$ received
and evaluated in step 3a. When that happens, the encryption


1) The client chooses a private key sk, a public key K and parameters
for the additively homomorphic encryption scheme and sends K and
the parameters to the server.

2) For every combination $\sigma$ of t out of T indices the client:

a) constructs a polynomial:

$P_\sigma(x) = (x - \sigma(X_1)) \cdot (x - \sigma(X_2)) \cdots (x - \sigma(X_{n_C}))$ of degree
$n_C$ with domain $D^T$ and range R.

b) sends $\{P_\sigma\}_K$ (the encrypted polynomial) to the server.

3) For every $Y_i \in Y$, $1 \le i \le n_S$, and every received polynomial
$\{P_\sigma\}_K$ (corresponding to the combination $\sigma$) the server:

a) evaluates polynomial $\{P_\sigma\}_K$ at the point $\sigma(Y_i)$ to compute
$\{w_i^\sigma\}_K = \{r \cdot P_\sigma(\sigma(Y_i)) + Y_i\}_K$, where $r \in R$ is always a
fresh random value.

b) sends $\{w_i^\sigma\}_K$ to the client.

4) The client decrypts all received messages. If for such a decryption
$w_i^\sigma \approx_t X_j$ for any $X_j \in X$, then he adds $w_i^\sigma$ to the output set.

Fig. 3. Polynomial Based Protocol solving FPM problem

of $Y_j$ is sent to the client. Later on, the client can recognize
this value by the convention that values in $D^T$ are represented
in R using a $0^k$ prefix. Otherwise (if $Y_j$ does not match with
any element from X) all the values sent to the client contain
a random blinding element r (and therefore their decryptions
are in Y with negligible probability).
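
The data flow of the protocol can be sketched in Python on plaintexts only: the encryption layer is left out, so the sketch shows correctness of the blinding but provides no privacy. The modulus, the digit domain, the integer encoding of words, and all function names are assumptions of this sketch.

```python
import random
from itertools import combinations

Q = 2**61 - 1      # prime modulus standing in for R (assumption)
BASE = 10          # |D|: letters are the digits 0..9 in this sketch


def encode(letters):
    """Pack letters into an integer; real words stay below BASE**T."""
    value = 0
    for letter in letters:
        value = value * BASE + letter
    return value


def decode(value, T):
    letters = []
    for _ in range(T):
        value, letter = divmod(value, BASE)
        letters.append(letter)
    return tuple(reversed(letters))


def agree(x, y):
    return sum(a == b for a, b in zip(x, y))


def fpm_polynomial(client, server, t, T):
    output = set()
    for sigma in combinations(range(T), t):
        # Client side: the roots of P_sigma are sigma(X_i).
        roots = [encode([x[i] for i in sigma]) for x in client]
        for y in server:
            # Server side: r * P_sigma(sigma(Y)) + Y with a fresh r.
            point = encode([y[i] for i in sigma])
            p_val = 1
            for root in roots:
                p_val = p_val * (point - root) % Q
            w = (random.randrange(1, Q) * p_val + encode(y)) % Q
            # Client side: keep w only if it looks like a word and matches.
            if w < BASE ** T:
                cand = decode(w, T)
                if any(agree(cand, x) >= t for x in client):
                    output.add(cand)
    return output


client = [(1, 2, 3), (1, 4, 5)]
server = [(5, 4, 3), (1, 2, 9)]
found = fpm_polynomial(client, server, t=2, T=3)
```

On the input that breaks the protocol of Section III, the word [5,4,3] stays blinded (except with negligible probability), while (1, 2, 9) is recovered because it shares two letters with (1, 2, 3).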

3) Security: The client's input data is secure because all the
data received by the server are encrypted (using a semantically
secure cryptosystem). Hence the server cannot distinguish
between different client inputs. The privacy of the server is
protected because the client only learns about those elements
from Y that are also in X, and because (by semi-honesty) it
does not send specially constructed polynomials to cheat the
server. If an element $Y_i \in Y$ does not belong to X then a
random value is sent by the server (see the correctness proof
above).

4) Complexity: The messages being sent in this protocol are
encryptions of plaintext from the domain R, i.e., $O(T\log|D| + k)$
bits. In step 2 the client sends $\binom{T}{t}$ polynomials of degree
$n_C$ (sending each coefficient separately). Then in step 3 the
server responds with $n_S$ values for every polynomial. Hence
in total $O((n_S + n_C) \cdot \binom{T}{t})$ messages are sent. Therefore, the
total bit complexity is $O((n_S + n_C) \cdot \binom{T}{t} \cdot (T\log|D| + k))$.

The time complexity is the same as the number of messages
in the protocol: $O((n_S + n_C) \cdot \binom{T}{t})$.

V. Secret Sharing Based Protocols 

The number of messages sent in the previous protocol is 
very large. Therefore, we now present two protocols solving 
the FPM problem based on linear secret sharing that trade 
a decrease in message complexity for an increase in time 
complexity. Both work in the model with a semi-honest 
adversary. First we describe the simple (but slow) protocol 
and later the faster, improved one. We present the simple 
version mainly to facilitate the understanding of the improved 
protocol. 

A. A Simple Version of the Protocol 

The simple protocol is presented in Figure 4. The idea 
behind the protocol is the following. The server encrypts all 



1) The client generates sk, K and parameters for the additively
homomorphic cryptosystem and sends K and the parameters to the server.

2) For each $X_i \in X$:

a) The client encrypts each letter $x_i^w$ of $X_i$ and sends $\{x_i^w\}_K$ to the
server.

b) For each $Y_j \in Y$, run the protocol find-matching(i,j).

find-matching(i,j):

1) The server generates $sk_j$ and parameters for the symmetric
cryptosystem and sends parameters to the client.

2) The server sends $\tilde{y}_j = E_{sk_j}(Y_j)$ to the client.

3) The server prepares t-out-of-T secret shares $[\bar{s}^1, \bar{s}^2, \ldots, \bar{s}^T]$ with
secret $0^k \| sk_j$, where k is the security parameter.

4) For every letter $y_j^w$ in $Y_j$, the server computes:

$v_w = ((\{x_i^w\}_K -_h \{y_j^w\}_K) \cdot_h r) +_h \{\bar{s}^w\}_K$, which equals
$\{((x_i^w - y_j^w) \cdot r + \bar{s}^w)\}_K$, where r is always a fresh, random value
from the domain of plaintext.

5) The server sends $[v_1, v_2, \ldots, v_T]$ to the client.

6) The client decrypts the values and checks whether it is possible to
reconstruct the secret $0^k \| z$ from them. In order to do that, he needs to
try all possible combinations of t among the T decrypted (potential)
shares. If it is possible and $Dec_z(\tilde{y}_j) \approx_t X_i$ then he adds $Dec_z(\tilde{y}_j)$
to his output set.

Fig. 4. Simple secret sharing protocol solving FPM problem

its words $Y_j$ using separate symmetric keys $sk_j$ and sends the
results to the client. The protocol then proceeds to reveal key
$sk_j$ to the client only if there is a word $X_i$ such that $X_i \approx_t Y_j$.

Every word $X_i$ of the client is matched with each word $Y_j$
of the server one by one. To this end, the client first sends
each letter of $X_i$ to the server, each encrypted separately under
the public key K (which, by step 1 of Figure 4, belongs to the client).

Upon reception of the encrypted letters for $X_i$, the server
does the following for each word $Y_j$ in his set (using the
subroutine find-matching(i,j)). First the server prepares a
secret key ($sk_j$ for the corresponding word $Y_j$) for the symmetric
encryption scheme (e.g., AES), and sends the encrypted $Y_j$ to
the client. Then it prepares t-out-of-T random secret shares
$\bar{s}^1, \ldots, \bar{s}^T$ such that $s = 0^k \| sk_j$. Share $\bar{s}^w$ is "attached" to the
w-th letter of word $Y_j$, so to speak. Note that each time a new
word $X_i$ from the client is matched with $Y_j$, fresh secret shares
are generated to avoid an attack similar to the one described
in Section III.

Using the homomorphic properties of the encryption
scheme, the server then computes, for each encrypted letter
$\{x_i^w\}_K$ it received, the value $v_w = \{((x_i^w - y_j^w) \cdot r + \bar{s}^w)\}_K$
(using a fresh random value r each time, and encrypting $y_j^w$
to the public key K). Note that $v_w = \{\bar{s}^w\}_K$ if and only if
$x_i^w = y_j^w$.

Finally, the server sends $v_1, \ldots, v_T$ back to the client. The
client decrypts these values, and if $X_i \approx_t Y_j$, then by the
observation in the previous paragraph, among the decrypted
values there are at least t shares $\bar{s}^w$ from which $sk_j$ and
therefore $Y_j$ can be reconstructed.
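
A single call of the subroutine can be sketched on plaintexts as follows. Both encryption layers are omitted, Shamir's scheme is used as the LSS scheme, and the modulus, the toy security parameter `K_BITS`, and all function names are assumptions of this sketch.

```python
import random
from itertools import combinations

Q = 2**61 - 1        # prime plaintext modulus (assumption of this sketch)
K_BITS = 40          # toy security parameter: valid secrets are below Q >> K_BITS


def share(secret, t, T):
    """Shamir shares of secret at the points 1..T."""
    coeffs = [secret] + [random.randrange(Q) for _ in range(t - 1)]
    return [sum(c * pow(w, e, Q) for e, c in enumerate(coeffs)) % Q
            for w in range(1, T + 1)]


def reconstruct(points):
    """Lagrange interpolation at 0 from (index, value) pairs."""
    secret = 0
    for xi, yi in points:
        num, den = 1, 1
        for xj, _ in points:
            if xj != xi:
                num = num * (-xj) % Q
                den = den * (xi - xj) % Q
        secret = (secret + yi * num * pow(den, -1, Q)) % Q
    return secret


def server_reply(x, y, sk, t):
    """v_w = (x^w - y^w) * r + s^w, computed here on plaintexts."""
    shares = share(sk, t, len(y))
    return [((a - b) * random.randrange(1, Q) + s) % Q
            for a, b, s in zip(x, y, shares)]


def client_recover(values, t):
    """Try every t-subset; accept a secret that carries the 0^k prefix."""
    indexed = list(enumerate(values, start=1))
    for subset in combinations(indexed, t):
        z = reconstruct(subset)
        if z < Q >> K_BITS:
            return z
    return None


sk = 424242
hit = client_recover(server_reply((1, 2, 3), (1, 2, 9), sk, t=2), t=2)
miss = client_recover(server_reply((1, 2, 3), (5, 4, 3), sk, t=2), t=2)
```

When two letters agree, two unblinded shares survive and the key is recovered; when only one letter agrees, every 2-subset contains a blinded value and, except with probability about $2^{-40}$ per subset in this toy setting, no candidate passes the prefix check.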

Due to space constraints we skip the proofs of correctness
and security of the protocol from Figure 4 (they can be found
in the appendix).

1) Complexity: Two kinds of messages are sent in this
protocol. Messages encrypted by the homomorphic encryption
scheme carry plaintexts of $O(\log|D| + k)$ bits. The second
kind of messages are the messages encrypted by the symmetric
encryption scheme (they are sent in step 2 of the subroutine).
They are encryptions of plaintext from the domain $D^T$.

The main impact on the message complexity of the protocol
is the fact that the subroutine find-matching is called
$n_C n_S$ times. In this subroutine, the server sends O(T)
ciphertexts in step 5. Hence, in total $O(n_C n_S T)$ messages of size
$O(\log|D| + k)$ and $O(n_S)$ messages of size $O(\log|D|^T + k)$
are sent in this protocol. Therefore, the bit complexity of the
protocol is: $O(n_C n_S T(\log|D| + k) + n_S(\log|D|^T + k)) =
O(n_C n_S T(\log|D| + k))$.

We see that first encrypting the words stored by the
server using symmetric keys, and later using the secret sharing
mechanism to reveal these keys instead of the full words,
changes the bit complexity from $O(T(\log|D|^T + k))$ to
$O(T(\log|D| + k))$, removing a factor T.

The server prepares the T secret shares $n_S n_C$ times.
Producing T secret shares can be done efficiently and therefore
the time complexity of the server is reasonably low. The client
(in step 6 of each subroutine call) verifies if he can reconstruct
the secret. This verification costs $\binom{T}{t}$ reconstructions (and
one reconstruction can be done efficiently). The number of
reconstructions is in the order of $O(n_S n_C \binom{T}{t})$, which is the
major drawback of this protocol.

B. An Improved Protocol 

We can improve the message complexity by combining the
idea of using secret sharing (protocol 4) with the idea of
encoding all characters at position w using a polynomial $P_w$
(protocol 3). The resulting protocol for FPM is presented in
Figure 5. It consists of two phases: a polynomial phase, and
a ticket phase.

The polynomial phase runs as follows. As in the previous
protocol, words are first sent encrypted to the client, while the
key $sk_j$ is encoded using a secret sharing scheme such that
when the client has a word matching on letter w, it obtains
share $\bar{s}_j^w$.

However, we now encode the shares at letter position w
using a polynomial $P_w$ defined by

$(P_w(y_1^w) = \bar{s}_1^w) \wedge (P_w(y_2^w) = \bar{s}_2^w) \wedge \ldots \wedge (P_w(y_n^w) = \bar{s}_n^w)$

(where, for technical reasons, at least one random point is added
to ensure privacy in the case $x_i^w \ne y_j^w$). This polynomial is
sent to the client to allow him to recover share $\bar{s}_j^w$ for each
letter $x_i^w = y_j^w$. In fact, it is sent encrypted to the client; more
about this later.

We need to avoid the problem discussed in Section III with
the original FPM protocol. Observe that the above definition
of $P_w$ is only valid if we require that $\bar{s}_j^w = \bar{s}_m^w$ whenever
$y_j^w = y_m^w$. This means that, as we proceed through the list
of words $Y_j$ of the server constructing secret shares for key
$sk_j$, we accumulate restrictions on the possible share values
we can use. In the extreme case, for some word $Y_j$, T shares
could already be fixed! If T was the total number of shares,
then $sk_j$ would be fixed and we would have the same leakage
of information discussed in Section III.



Polynomial Phase:

1) The server prepares sk, K and parameters for the additively
homomorphic cryptosystem and sends K and the parameters to the client.

2) For all $Y_j \in Y$, the server generates $sk_j$ and parameters for the
symmetric cryptosystem and sends parameters to the client. Later the
server sends $\tilde{y}_j = E_{sk_j}(0^k \| Y_j)$ to the client.

3) For all $Y_j \in Y$, the server prepares [T + 1]-out-of-[2 · T - t + 1]
secret shares $[\bar{s}_j^1, \bar{s}_j^2, \ldots, \bar{s}_j^{2T-t+1}]$ with the secret $0^k \| sk_j$, where
k is the security parameter. If $y_j^w = y_m^w$ then $\bar{s}_j^w = \bar{s}_m^w$.

The server sends $[\bar{s}_j^{T+1}, \ldots, \bar{s}_j^{2T-t+1}]$ to the client.

4) The server prepares T polynomials (for w = 1 to T) of degree n:

a) The polynomial is defined in the following way:

$(P_w(y_1^w) = \bar{s}_1^w) \wedge (P_w(y_2^w) = \bar{s}_2^w) \wedge \ldots \wedge (P_w(y_n^w) = \bar{s}_n^w)$
The number of points is increased to n + 1 by adding random points
(at least one random point is added).

b) The server computes the coefficients of the polynomials,
encrypts each polynomial $\{P_w\}_K$ and sends it to the client.

5) The client evaluates T polynomials (for w = 1 to T) on each letter of
each word (for i = 1 to n): $\{v_i^w\}_K = \{P_w(x_i^w)\}_K$. If $x_i^w = y_j^w$
then $v_i^w = \bar{s}_j^w$.

6) The client blinds the results $v_i^w$ with random values $r_i^w$ and sends
them to the server: $\{v_i^w + r_i^w\}_K$.

Ticket Phase:

6) For i = 1 to n, the server prepares [T + 1]-out-of-[2 · T - t + 1]
secret shares $[\bar{\tau}_i^1, \bar{\tau}_i^2, \ldots, \bar{\tau}_i^{2T-t+1}]$ with secret 0. Later he sends
$[\bar{\tau}_i^{T+1}, \ldots, \bar{\tau}_i^{2T-t+1}]$ to the client.

7) For i = 1 to n and for w = 1 to T, the server decrypts the received
messages $D_{sk}(\{v_i^w + r_i^w\}_K)$ and sends $(v_i^w + r_i^w + \bar{\tau}_i^w)$ to the client.

8) The client unblinds them (by subtracting $r_i^w$) obtaining $q_i^w$.
If $x_i^w = y_j^w$ then $q_i^w = \bar{s}_j^w + \bar{\tau}_i^w$.

9) For i = 1 to n and j = 1 to n, the client checks if it is
possible to reconstruct the secret $0^k \| z$ from:
$[q_i^1, q_i^2, \ldots, q_i^T, \bar{s}_j^{T+1} + \bar{\tau}_i^{T+1}, \bar{s}_j^{T+2} + \bar{\tau}_i^{T+2}, \ldots, \bar{s}_j^{2T-t+1} + \bar{\tau}_i^{2T-t+1}]$.
In order to do that, the client needs to try all possible combinations
of t shares among the T decrypted q shares (the rest of the shares
is the same during reconstructions). If it is possible and for any $\tilde{y}_j$,
$Dec_z(\tilde{y}_j) = 0^k \| a$, and a matches $X_i$ then he adds a to his output
set.

Fig. 5. Improved secret sharing protocol solving FPM problem



We solve this problem by adding extra shares $\bar{s}_j^{T+1}, \ldots$
(that are in fact sent to the client in the clear!) and changing
the parameters of the secret sharing scheme, as follows. We
observe that if at most T shares can get fixed as described
above, the best we can do is create a (T + 1)-out-of-(T + x)
scheme. This ensures that an arbitrary $sk_j$ can actually be
encoded by the secret sharing scheme, even given T fixed
shares. The x extra shares are given away "for free" to the
client. Now to ensure that the client needs at least t letters that
match word $Y_j$ in order to be able to reconstruct $sk_j$ from the
shares it receives, we need t = T + 1 - x, i.e., x = T + 1 - t.

In other words, we use a (T + 1)-out-of-(2·T + 1 - t) secret 
sharing scheme where for each word $Y_j$ 

• the first T shares are encoded using polynomials 
$P^1, \ldots, P^T$, and 

• the remaining T + 1 - t shares are given to the client in the 
clear. 

If $X_i \approx_t Y_j$, then the client obtains at least t shares using the 
polynomials $P^1, \ldots, P^T$. Combined with the T + 1 - t shares 
it got for free, it owns at least T + 1 shares that allow it to 
reconstruct the secret. Note, however, that when it obtains the 
shares by evaluating the polynomials for the letters in $X_i$, it 
does not know to which $Y_j$ these shares actually correspond. 
So, to actually try to reconstruct the secret, it needs to 
combine these shares with each group of free T + 1 - t shares 
corresponding to $Y_1$ up to $Y_n$, one by one. 
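As a worked instance of the parameter choice above, consider the concrete values T = 5 and t = 3. These values are assumed here purely for illustration and do not come from the protocol description.

```latex
\begin{align*}
T &= 5, \qquad t = 3, \\
x &= T + 1 - t = 3 \quad \text{(free shares per word } Y_j\text{)}, \\
\text{scheme} &:\ (T+1)\text{-out-of-}(2T+1-t) = 6\text{-out-of-}8, \\
\text{3 matching letters} &:\ 3 + 3 = 6 \ge 6 \quad \text{(reconstruction possible)}, \\
\text{2 matching letters} &:\ 2 + 3 = 5 < 6 \quad \text{(reconstruction fails)}.
\end{align*}
```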

This works, but it still leaves the leakage of information 
problem discussed in section [III]: several different words 
held by the client may each match on some characters of a word 
$Y_j$ held by the server, such that t shares for $sk_j$ are released 
even though no single word of the client actually matches $Y_j$. 
This problem is solved in the ticket phase, as follows. 

In fact, the polynomials sent by the server to the client 
are encrypted using the homomorphic encryption scheme. 
Therefore, when evaluating the polynomials for a word $X_i$, the 
client only obtains the encrypted shares corresponding to it. 
These are useless by themselves. The client needs the help of 
the server to decrypt these shares. In doing so, the server will 
enforce that the shares the client receives in the end actually 
correspond to a single word in the client set (and not a mix 
of shares obtained using letters from different words as in the 
attack described in the previous paragraph). 

The server enforces this using so-called tickets (hence the 
name: ticket phase). Tickets are in fact (T + 1)-out-of-(2·T + 
1 - t) random secret shares for the secret 0. The client sends 
the groups of encrypted shares (blinded by random values) that 
he got for every word $X_i$ to the server. The server, for every 
group of shares received from the client, decrypts these shares 
and adds the ticket shares. The result is sent back to the 
client, who unblinds the result (subtracting the random value). 
Because of the linear property of the secret sharing scheme, 
the secret corresponding to the shares the client receives in the 
end (that are the sum of the original share and the ticket share) 
has not changed. But if the client tries to combine different 
shares obtained from different words, the shares of the tickets 
hidden within them no longer match and reconstruction of the 
secret is prevented. 
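The effect of the tickets can be illustrated with a small sketch. It assumes a Shamir-style polynomial secret sharing over a prime field; the modulus `P`, the helper names, and the parameters in the demo are choices made for this illustration and are not prescribed by the protocol. Encryption and blinding are left out, so only the linearity argument is shown.

```python
import random

P = 2**61 - 1  # prime modulus for the share field (assumed for this sketch)


def make_shares(secret, threshold, count, rng):
    """Shamir shares of `secret` at x = 1..count (polynomial of degree threshold - 1)."""
    coeffs = [secret] + [rng.randrange(P) for _ in range(threshold - 1)]
    return [sum(c * pow(x, e, P) for e, c in enumerate(coeffs)) % P
            for x in range(1, count + 1)]


def reconstruct(points):
    """Lagrange interpolation at x = 0 from (x, share) pairs."""
    secret = 0
    for xi, yi in points:
        num, den = 1, 1
        for xj, _ in points:
            if xj != xi:
                num = num * (-xj) % P
                den = den * (xi - xj) % P
        secret = (secret + yi * num * pow(den, -1, P)) % P
    return secret


def add_tickets(shares, tickets):
    """Component-wise sum of a share vector and a ticket vector."""
    return [(s + tk) % P for s, tk in zip(shares, tickets)]


def as_points(values, threshold):
    """Pair the first `threshold` values with their evaluation points 1, 2, ..."""
    return [(x + 1, v) for x, v in enumerate(values[:threshold])]


if __name__ == "__main__":
    rng = random.Random(7)
    T, t = 4, 3
    threshold, count = T + 1, 2 * T + 1 - t
    shares = make_shares(4242, threshold, count, rng)
    ticket_a = make_shares(0, threshold, count, rng)  # tickets for one client word
    ticket_b = make_shares(0, threshold, count, rng)  # tickets for another word
    same = add_tickets(shares, ticket_a)
    # Mix: the first two positions use ticket_a, the rest use ticket_b.
    mixed = [(s + (ticket_a[k] if k < 2 else ticket_b[k])) % P
             for k, s in enumerate(shares)]
    print(reconstruct(as_points(same, threshold)))
    print(reconstruct(as_points(mixed, threshold)))
```

Shares that all carry tickets from the same group still interpolate to the original secret, whereas a mix of two ticket groups interpolates to an unrelated value except with negligible probability.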

Due to space constraints we skip the proof of correctness 
(which is essentially similar to the discussion above) of the 
protocol from Figure [5]. This proof can be found in the 
appendix. 

1) Security: The privacy of the client's input data is protected 
because all of the data received by the server (in step [6] of 
the polynomial phase) is of the form $v_i^w + r_i^w$, where $r_i^w$ is 
a random value from the domain of the plaintext. Hence the 
server cannot distinguish between different client inputs. 

The privacy of the server is protected because the client 
receives correct secret shares of some $sk_j$ (corresponding to 
$Y_j \in Y$) if and only if there is an element $X_i \in X$ such 
that $X_i \approx_t Y_j$. In the polynomial phase, the client receives 
encrypted polynomials and n groups with T - t + 1 shares 
($[\bar{s}_j^{T+1}, \ldots, \bar{s}_j^{2T-t+1}]$) of a [T + 1]-out-of-[2·T - t + 1] secret 
sharing scheme. Hence, there is no leakage of information 
in the polynomial phase. The client receives information in 
plaintext in steps [6] and [7] of the ticket phase. In this situation, 
the client has at least T + 1 correct secret shares during step [7] 
and he can reconstruct the secret $0^k \| sk_m$ (and therefore, $Y_m$). 

If there is no such element in $X$ to which $Y_j$ is similar, 



1) The client prepares sk, K and the parameters for the additively 
homomorphic cryptosystem and sends K and the parameters to the 
server. 

2) Run subroutine equality-matrix. After this subroutine the server 
has obtained the following matrix: 

$f(w,i,j) = \{0\}_K$ if $x_i^w = y_j^w$, and $f(w,i,j) = \{1\}_K$ otherwise, 

where $w \in \{1, \ldots, T\}$ and $i,j \in \{1, \ldots, n\}$. 

3) For each $X_i \in X$ and $Y_j \in Y$: 

a) the server computes $\{\Delta(X_i,Y_j)\}_K = \{\sum_{w=1}^{T} f(w,i,j)\}_K$ and, 
for $\ell = 0$ to $T - t$, sends $\{(\Delta(X_i,Y_j) - \ell) \cdot r + (0^k \| Y_j)\}_K$ to 
the client. Here r is always a fresh, random value. 

b) The client decrypts all $T - t + 1$ messages and if any plaintext is in $D^T$ 
and matches any word from X, then the client adds this plaintext 
to the output set. 

Fig. 6. Hamming distance based protocol for the FPM problem 



then the client receives fewer than t correct shares in every group 
$q_i$ of potential shares: $q_i^w = \bar{\tau}_i^w + \bar{s}_j^w$ (where i is the index of 
the received group of potential shares). The other values (for 
incorrect letters) include $P^w(y'_j)$, which cannot be determined. 
This is because the client does not know enough of the points 
defining the polynomial (a polynomial of degree n is defined by 
n + 1 points, of which the client can know only n), and at least 
one unknown point is random. This is exactly the situation 
of a polynomial based secret sharing scheme when not 
enough shares are known. The client cannot reconstruct $sk_j$ 
for any group separately (by the secret sharing assumption), 
because he has fewer than T + 1 correct secret shares. Of all 
the shares, (T - t + 1) come from values that are sent in 
plaintext. For every group of shares, the $\bar{\tau}$ values are different and 
therefore make every received group of shares independent. 
The probability that a random value from R is a correct 
share is negligible (with respect to a security parameter k). 
Therefore, the probability that the client can recover illicit 
information is negligible. 

2) Complexity: In step [2] the server sends n messages 
encrypted by the symmetric encryption scheme, each of 
size $O(\log |D|^T + k)$ (that is, $O(n(T \log |D| + k))$ bits). 
Later, in step [3], the server sends $O(nT)$ unencrypted messages 
of size $O(k + \log |D|)$ (that is, $O(nT(\log |D| + k))$ 
bits). In step [4] the server sends encryptions of T polynomials 
of degree n. This totals $O(nT(\log |D| + k))$ bits. For 
every received polynomial, the client computes n values and 
sends them encrypted to the server (again $O(nT(\log |D| + k))$ 
bits). In the ticket phase, in step [7], the server sends $O(nT)$ 
unencrypted messages, that is, $O(nT(\log |D| + k))$ bits. Hence, 
the bit complexity of the entire protocol totals 
$O(nT(k + \log |D|) + n(k + \log |D|^T)) = O(nT(k + \log |D|))$. 

The main part of the server time complexity is preparing 2n 
times [T + 1]-out-of-[2·T - t + 1] secret shares. Since producing 
(2·T - t + 1) secret shares can be done efficiently, the time 
complexity of the server is reasonable. The crucial part for the 
time complexity of the client is step [9] (which is performed $n^2$ 
times). In this step the client checks whether he can reconstruct 
the secret corresponding to $Y_j$. This verification costs $\binom{T}{t}$ 
reconstructions (and one reconstruction can be done efficiently). 
The total number of reconstructions is in the order of 
$O(n^2 \binom{T}{t})$, which is the major drawback of this protocol. 
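The subset search that dominates the client's running time can be sketched as follows. This is a minimal illustration under stated assumptions: Shamir sharing over an assumed prime field `P`, and a validity test `z < 2**SECRET_BITS` standing in for the check of the $0^k$ prefix. The names `search`, `make_shares`, and `reconstruct` are hypothetical helpers, and encryption, blinding, and tickets are omitted.

```python
import itertools
import math
import random

P = 2**61 - 1      # prime field for the shares (assumed for this sketch)
SECRET_BITS = 20   # valid secrets are < 2**SECRET_BITS; stands in for the 0^k prefix


def make_shares(secret, threshold, count, rng):
    """Shamir shares of `secret` at x = 1..count (polynomial of degree threshold - 1)."""
    coeffs = [secret] + [rng.randrange(P) for _ in range(threshold - 1)]
    return [sum(c * pow(x, e, P) for e, c in enumerate(coeffs)) % P
            for x in range(1, count + 1)]


def reconstruct(points):
    """Lagrange interpolation at x = 0 from (x, share) pairs."""
    secret = 0
    for xi, yi in points:
        num, den = 1, 1
        for xj, _ in points:
            if xj != xi:
                num = num * (-xj) % P
                den = den * (xi - xj) % P
        secret = (secret + yi * num * pow(den, -1, P)) % P
    return secret


def search(q, free, t):
    """Try every t-subset of the T values in q together with all free shares.

    Returns (secret or None, number of reconstructions performed).
    """
    T = len(q)
    free_points = [(T + 1 + k, s) for k, s in enumerate(free)]
    attempts = 0
    for subset in itertools.combinations(range(T), t):
        attempts += 1
        z = reconstruct([(w + 1, q[w]) for w in subset] + free_points)
        if z < 2**SECRET_BITS:  # looks like a valid secret
            return z, attempts
    return None, attempts


if __name__ == "__main__":
    rng = random.Random(1)
    T, t = 4, 3
    shares = make_shares(777, T + 1, 2 * T + 1 - t, rng)
    q, free = shares[:T], shares[T:]
    q[1] = rng.randrange(P)  # one letter does not match: its share is random
    print(search(q, free, t), "at most", math.comb(T, t), "attempts per pair")
```

For one pair (i, j) the loop performs at most $\binom{T}{t}$ reconstructions, and the client repeats it for all $n^2$ pairs.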

VI. Hamming Distance Based Protocol 

In this section we present two protocols solving the FPM 
problem based on computing the encrypted Hamming distance: 
one that is simple and efficient for small domains 
and another that uses oblivious transfer. They differ 
only in the implementation of the subroutine 
equality-matrix (the frame of the protocol is the same 
for both of them). First we describe the simple protocol and 
then the one using oblivious transfer. 

A technique to compute the encrypted Hamming distance to 
solve the FPM problem has been introduced in [3]. However, 
the protocol in that paper uses generic 2-party computations 
together with oblivious transfer, making their approach less 
practical. 

Our protocol (see Figure [6]) works as follows. The server 
first obtains, using the subroutine equality-matrix, a 
3-dimensional matrix $f(w,i,j)$ containing the encrypted equality 
test for the $w$-th letter in words $X_i$ and $Y_j$ (where $\{0\}_K$ 
denotes equality and $\{1\}_K$ denotes inequality). The server 
sums the entries in this matrix to compute the encrypted 
Hamming distance $d_{ij} = \Delta(X_i, Y_j)$ between the words $X_i$ and 
$Y_j$. Subsequently, the server sends $Y_j$ blinded by a random 
value r multiplied by $d_{ij} - \ell$, for all $0 \le \ell \le T - t$. If 
$0 \le d_{ij} \le T - t$, then for some $\ell$ the value $Y_j$ is not blinded 
at all. This allows the client to recover $Y_j$. Otherwise $Y_j$ is 
blinded by some random value for every $\ell$, and the client learns 
nothing. 
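The blinding step described above can be simulated on plaintexts. The sketch below assumes arithmetic modulo a prime `P` in place of the additively homomorphic cryptosystem, and a size check `m < 2**PAYLOAD_BITS` in place of the test on the $0^k$ prefix and on membership in $D^T$. In the protocol the server only holds an encryption of the distance; here `server_messages` receives both words in the clear purely so that the arithmetic can be run. All names are hypothetical.

```python
import random

P = 2**61 - 1       # plaintext modulus (assumed for this sketch)
PAYLOAD_BITS = 20   # valid payloads are < 2**PAYLOAD_BITS; stands in for 0^k || Y_j


def hamming(x, y):
    """Number of positions at which the two words differ."""
    return sum(1 for a, b in zip(x, y) if a != b)


def server_messages(x, y, t, payload, rng):
    """Plaintexts of the T - t + 1 messages: (d - l) * r + payload with fresh r."""
    T = len(x)
    d = hamming(x, y)  # the real server computes this under encryption
    return [((d - l) * rng.randrange(1, P) + payload) % P
            for l in range(T - t + 1)]


def client_recover(messages):
    """Return the first plaintext that looks like a valid payload, else None."""
    for m in messages:
        if m < 2**PAYLOAD_BITS:
            return m
    return None


if __name__ == "__main__":
    rng = random.Random(3)
    near = server_messages([1, 2, 3, 4], [1, 2, 9, 4], 3, 12345, rng)
    far = server_messages([1, 2, 3, 4], [1, 8, 9, 4], 3, 12345, rng)
    print(client_recover(near), client_recover(far))
```

When the distance lies in {0, ..., T - t}, exactly one factor $d - \ell$ vanishes and the payload appears unblinded. Otherwise every message carries a nonzero multiple of a fresh random value.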

3) Correctness and Security of the protocol from Figure [6]: 
Assuming that in the subroutine equality-matrix the 
matrix $f$ has been securely obtained, protocol [6] calculates a 
correct output. This can be concluded from the following facts: 
if $X_i \approx_t Y_j$ then (in step [3a]) $\Delta(X_i,Y_j) \in \{0, \ldots, T - t\}$, 
and therefore $\{0^k \| Y_j\}_K$ is sent to the client. Privacy of the 
server is protected because in step [3a], if $X_i \not\approx_t Y_j$ then 
$\Delta(X_i, Y_j) \notin \{0, \ldots, T - t\}$ and therefore all values received by 
the client look random to him. Correctness and security proofs 
of this protocol resemble the proofs of the protocol presented 
in Figure [4] and are omitted here. 

A. Implementing Subroutine equality-matrix 

The first method to implement the subroutine 
equality-matrix is as follows. The client sends 
the letters of all his words to the server as encrypted 
vectors $d_i^w$ indexed by $\{0, \ldots, |D| - 1\}$ (where $i \in \{1, \ldots, n\}$ and 
$w \in \{1, \ldots, T\}$) such that $d_i^w(v) = \{1\}_K$ if $v = x_i^w$, and 
$d_i^w(v) = \{0\}_K$ otherwise. This process can be described 
as sending encryptions of the unary encoding of the letters 
of all his words. Subsequently the server defines the 
matrix as $f(w,i,j) = d_i^w(y_j^w)$. The main drawback of 
this method is that its bit complexity includes a factor 
$O(|D| \cdot n \cdot T + n^2 \cdot (T - t))$. However, the protocol is simple, 
and for small domains D (e.g., ASCII letters) it is efficient. 
For constant size D and $T \approx t$ the bit complexity of the 



1) The client generates vectors $d_i^w$: [0, ..., |D| - 1] (where $i \in \{1, \ldots, n\}$ 
and $w \in \{1, \ldots, T\}$) such that: $d_i^w(v) = 1$ if $v = x_i^w$, and $d_i^w(v) = 0$ 
otherwise. 

2) The matrix $f$ is defined in the following way (for all $i,j \in \{1, \ldots, n\}$ 
and $w \in \{1, \ldots, T\}$): 

a) The client picks a random bit $b_{i,j}^w$. 

b) The server and the client perform 1-out-of-|D| oblivious transfer as 
follows. The client constructs $h_{i,j}^w$, which is a vector [0, ..., |D| - 1], 
as follows: 

$h_{i,j}^w = [d_i^w(0) \oplus b_{i,j}^w, d_i^w(1) \oplus b_{i,j}^w, \ldots, d_i^w(|D| - 1) \oplus b_{i,j}^w]$. 

The server wants to obtain the value from the vector $h_{i,j}^w$ with index 
$y_j^w$. For that they perform the oblivious transfer protocol (where the 
server has an index and the client an array). Subsequently, the server 
obtains the value $h = h_{i,j}^w(y_j^w)$. 

c) The client sends $\{b_{i,j}^w\}_K$ to the server. 

d) $f(w,i,j) = \{b_{i,j}^w\}_K$ for $h = 0$, and $f(w,i,j) = \{1 - b_{i,j}^w\}_K$ for $h = 1$. 



Fig. 7. Subroutine equality-matrix based on oblivious transfer 



protocol reduces to $O(n^2 + n \cdot T)$ (which is significantly 
better than the bit complexity of the protocol from in this 
situation). 
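The unary encoding used by the first method can be illustrated on plaintexts. The sketch below leaves out encryption entirely and models letters as integers in {0, ..., |D| - 1}; the function names are hypothetical. It follows the definition given in this subsection, under which an entry is 1 exactly when the server's letter equals the client's letter.

```python
def unary_encode(letter, domain_size):
    """Vector d with d[v] = 1 if v == letter and 0 otherwise."""
    return [1 if v == letter else 0 for v in range(domain_size)]


def lookup_entries(client_word, server_word, domain_size):
    """Entries d_i^w(y_j^w) for one pair of words, one per letter position w."""
    vectors = [unary_encode(x, domain_size) for x in client_word]
    return [vectors[w][y] for w, y in enumerate(server_word)]


def ciphertexts_sent(domain_size, n, T):
    """Number of vector entries the client sends: |D| per letter of each word."""
    return domain_size * n * T


if __name__ == "__main__":
    print(unary_encode(2, 4))
    print(lookup_entries([2, 0, 3], [2, 1, 3], 4))
    print(ciphertexts_sent(128, 10, 8))
```

The count returned by `ciphertexts_sent` reflects the $|D| \cdot n \cdot T$ term in the bit complexity: the client sends one encrypted entry per domain element for every letter of every word.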

The second implementation of the subroutine is shown 
in Figure [7]. This implementation uses 1-out-of-q oblivious 
transfer. An oblivious transfer is a 2-party protocol, where 
a client has a vector of q elements, and the server chooses 
any one of them in such a way that the server does not 
learn more than one, and the client remains oblivious to the 
value the server chooses. Such an oblivious transfer protocol 
is described in [6]. The fastest implementation of oblivious 
transfer works in time O(1). 

The second version of the subroutine equality-matrix 
uses such an oblivious transfer in the following way. Let 
$d_i^w$ be the unary encoding of $x_i^w$ as defined above (in the 
description of the first method of implementation). The client 
chooses a random bit $b_{i,j}^w$. Next he constructs a vector $h_{i,j}^w$ 
which contains all bits of $d_i^w$, each blinded by the random 
bit $b_{i,j}^w$. In other words, $h_{i,j}^w[x] = d_i^w(x) \oplus b_{i,j}^w$. Using an 
oblivious transfer protocol, the server requests the $y_j^w$-th entry 
in this vector, and obtains $d_i^w(y_j^w) \oplus b_{i,j}^w$. By the obliviousness, 
the client does not learn $y_j^w$, and the server does not learn 
any other entry. Subsequently, the client sends the encryption 
$\{b_{i,j}^w\}_K$ to the server. Based on this the server constructs 
$f(w,i,j) = \{d_i^w(y_j^w)\}_K$ as explained in the protocol. 
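The bit-level logic of this construction can be checked with a short simulation. It is a sketch under simplifying assumptions: the oblivious transfer is replaced by a direct array lookup and the encryption of the bit is dropped, so the code only demonstrates that selecting on the received value h recovers $d_i^w(y_j^w)$. The function names are hypothetical.

```python
def client_blind(d, b):
    """Client side: blind every bit of the unary vector d with the bit b."""
    return [bit ^ b for bit in d]


def server_select(h_vector, y):
    """Stand-in for 1-out-of-|D| oblivious transfer: the server gets entry y only."""
    return h_vector[y]


def server_derive(h, b):
    """Server side: f = b if h == 0, else 1 - b.

    In the protocol b arrives as a ciphertext and 1 - b is formed
    homomorphically; here both are plain bits.
    """
    return b if h == 0 else 1 - b


if __name__ == "__main__":
    d = [0, 0, 1, 0]  # unary encoding of the letter 2 over a domain of size 4
    for b in (0, 1):
        h_vector = client_blind(d, b)
        print([server_derive(server_select(h_vector, y), b) for y in range(4)])
```

For either choice of the blinding bit the derived values equal the entries of d, while the single value h seen by the server is masked by a bit it only receives in encrypted form.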

1) Corollary: These protocols are in general less efficient 
in bit complexity than the improved protocol based on secret 
sharing (see Section IV-B, Figure [5]). The first protocol is 
efficient for small domains, but significantly less efficient for 
large ones. In the second protocol there are $n^2 \cdot T$ oblivious 
transfer calls. Moreover, at this stage, we do not foresee a 
way to improve these protocols. However, the protocols are 
interesting because they do not use generic 2-party computations. 
Furthermore, the techniques being used contain novel 
elements, especially in the subroutine equality-matrix, 
which presents a technique for obtaining the encryption of a 
single bit using only one oblivious transfer. 



VII. Summary and Future Work 

In this paper we have presented several protocols solving 
the FPM problem. The most efficient one has bit complexity 
linear in the size of the input data and 
the security parameter. This is a significant improvement over 
existing protocols. The improvement comes at the expense of 
a factor n increase in time complexity (but only at the client). 

Currently, we are investigating how to reduce the time 
complexity of the client by using error correcting coding 
techniques. 
Appendix 

a) Correctness and security of the protocol from Figure 
[4]: In this protocol the client encrypts all of the letters of all of his 
words (with a unique secret key for every word) and sends the 
results to the server. Then for every pair of words $(X_i, Y_j)$, 
the participants run the subroutine find-matching. In the 
subroutine, the server first encrypts $Y_j$ with some random 
secret key $sk_j$ of the symmetric encryption scheme. Then it 
divides $sk_j$ into T shares (with threshold t) and for every letter 
in $Y_j$ calculates $v^w = \{((x_i^w - y_j^w) \cdot r + s^w)\}_K$. If $x_i^w = y_j^w$ then 
the client receives the correct share, otherwise a random value. 
However, at this step the client cannot distinguish in which 
situation he is (he cannot distinguish a random value from the 
correct share). Then the client checks if he can reconstruct the 
secret key using any combination of t out of the T elements 
$\{D_{sk}(v^w) \mid 1 \le w \le T\}$. He recognizes the secret key by the 
$0^k$ prefix, and by the fact that the value decrypted with that secret key is 
similar to one of the words from his set. If he has fewer than 
t correct secret shares then he cannot recover the secret key, 
and the retrieved data looks random to him (this follows from 
the security of the secret sharing scheme). Hence all required 
elements from Y appear in the client's output. The probability 
that some incorrect element is in the output set is negligible. 

The client input data is secure because all of the data 
received by the server is encrypted (using the semantically 
secure cryptosystem). Hence the server cannot distinguish 
between different client inputs. 

Privacy of the server is protected because the client receives 
correct secret shares of some $Y_j \in Y$ if and only if there is 
an element $X_i \in X$ such that $X_i \approx_t Y_j$. In this situation the 
client has at least t correct secret shares and he can reconstruct 
the secret $0^k \| sk_j$ (and therefore, it can decrypt $Y_j$). If there is 
no element in X to which $Y_j$ is similar then the client receives 
n independent groups of shares, none of which contains at 
least t correct shares. Hence from any of these groups he 
cannot retrieve any secret key. The probability that a random 
value from R is a correct share is negligible (with respect to 
security parameter k). Therefore the probability that the client 
can recover an illicit secret is negligible. 

b) Correctness of the protocol from Figure [5]: The first 
important issue appears in step [3] of the polynomial phase. 
Here the server prepares n groups of [T + 1]-out-of-[2·T - 
t + 1] shares $[\bar{s}_j^1, \bar{s}_j^2, \ldots, \bar{s}_j^{2T-t+1}]$. From the jth group he 
can recover $sk_j$, and therefore, $Y_j$. During the creation of these 
shares the server uses the rule: 

for $w \in \{1, \ldots, T\}$: if $y_i^w = y_j^w$ then $\bar{s}_i^w = \bar{s}_j^w$. (1) 

This rule is necessary because the first T shares from each 
group are later encoded as polynomials. 

This secret sharing is used here in the same role as the 
t-out-of-T one. However, if the t-out-of-T scheme is used, then 
it is impossible to choose the proper value of secrets (e.g., two 
matching, but different, words from Y would have the same 
secret because of Rule (1)). Secret shares $[\bar{s}_j^{T+1}, \ldots, \bar{s}_j^{2T-t+1}]$ 
are chosen arbitrarily only to enable proper values of the 
secrets. To choose arbitrary secrets even for equal words (Y 
could be a multiset), (T - t + 1) new shares (the ones that 
are sent in plaintext) are exactly enough. The role of shares 
$[\bar{s}_j^1, \ldots, \bar{s}_j^T]$ is as in classical secret sharing. Because the 
last T - t + 1 shares are known, the first T shares work like 
a t-out-of-T secret sharing scheme. 

Subsequently, in step [4] the server creates T polynomials 
of degree n in such a way that evaluating a polynomial on 
a corresponding letter from some word from Y results in 
a corresponding secret share. Later he sends the encrypted 
polynomials to the client. The client evaluates the polynomials 
on his words and obtains $\{v_i^w\}_K$ (where the following 
property holds: if $x_i^w = y_m^w$ then $v_i^w = \bar{s}_m^w$). After the 
ticket phase, the client receives T values $q_i^w = v_i^w + \bar{\tau}_i^w$, 
where $[\bar{\tau}_i^1, \bar{\tau}_i^2, \ldots, \bar{\tau}_i^T]$ are tickets, i.e., secret shares with the 
secret 0. Hence the client receives the group 
$[v_i^1 + \bar{\tau}_i^1, v_i^2 + \bar{\tau}_i^2, \ldots, v_i^T + \bar{\tau}_i^T]$, where if $x_i^w = y_m^w$ (for some $Y_m \in Y$) 
then $v_i^w = \bar{s}_m^w$. Therefore, by the linear property of LSS, if 
$v_i^w$ is a correct secret share, then $q_i^w = v_i^w + \bar{\tau}_i^w$ is also a 
correct secret share. The client tries to recover a secret 
for every received group of potential shares. However, for a 
proper reconstruction, he also needs the shares that have been sent 
to him in plaintext by the server. These shares are always 
correct, but he needs to combine shares from the polynomial 
and ticket phases. Moreover, he does not know which shares 
from the polynomial phase correspond to the shares from the 
ticket phase. As a result, the client has to check all of the 
combinations ($n^2$). If the client combines non-fitting shares 
then he cannot recover the proper secret key (and therefore 
the proper word). 

Hence, for $i,j \in \{1, \ldots, n\}$, the client checks if he can 
reconstruct the secret key from the following shares: 

$[q_i^1, q_i^2, \ldots, q_i^T, \bar{s}_j^{T+1} + \bar{\tau}_i^{T+1}, \bar{s}_j^{T+2} + \bar{\tau}_i^{T+2}, \ldots, \bar{s}_j^{2T-t+1} + \bar{\tau}_i^{2T-t+1}]$. 

If enough corresponding secret shares are in the group $q_i$, 
then the secret that can be recovered from them is $0^k \| sk_m$ 
(because the secret of the $\bar{\tau}$ shares is 0). Hence, in step [9] the 
client recovers all of the secret keys for which he has the 
corresponding shares. 



